Permission granted to reproduce this specification in complete and unaltered form. Excerpts may be printed with the following notice: "excerpted from the PNG (Portable Network Graphics) specification by Thomas Boutell." No notice is required in software that follows this specification; notice is only required when reproducing or excerpting from the specification itself.
The author wishes to acknowledge the contributions of the New Graphics Format mailing list and the readers of comp.graphics. (Mr. Boutell is solely responsible for errors of fact or design in the PNG specification, however.)
This is the sixth draft of the PNG (formerly "PBF") specification discussion document, replacing all previous drafts. There are several significant changes from the previous drafts.
The PNG format is intended to provide a portable, legally unencumbered, simple, lossless, streaming-capable, well-compressed, well-specified standard for bitmapped image files which gives new features to the end user at minimal cost to the developer.
It has been asked why the PNG format is not simply an extension of the GIF format. The short answer is that the GIF format is embroiled in legal disputes, does not support 24-bit images and lacks the option of an alpha channel.
It has been asked why the PNG format is not TIFF, or a subset of TIFF. The answer is that TIFF does not support a compression scheme that is not legally encumbered, and that a subset of TIFF would simply frustrate users making the reasonable assumption that a file saved as TIFF from Software XYZ will load into a program supporting our flavor of TIFF. Implementing full TIFF would violate the simplicity constraint.
It has been asked why the PNG format is not IFF, or a sub- or superset of IFF. The same concern applies as with TIFF: users with software that purports to generate IFF files will not be pleased when those files do not load in programs supporting the new specification. In addition, the IFF specification has rarely been accurately implemented and there is considerable disagreement among implementations. The IFF file structure could be used, but was not designed with streaming applications in mind; there are workarounds for this, but they are not widely implemented.
It has been asked why PNG does not include lossy compression. The answer is that JPEG already does an excellent job of lossy compression, and there is no reason to repeat that effort. Different tools, different jobs.
It has been asked why PNG uses network byte order. We have selected one byte ordering and used it consistently. Which order in particular is of little relevance, but network byte order has the advantage that routines to convert to and from it are already available on any platform that supports TCP/IP networking, including all PC platforms.
It has been asked why PNG does not directly support multiple images. It is expected that a metaformat will be created which permits multiple images and uses PNG-like data streams internally, with certain minimal alterations, such as the optional omission of palettes. In such a metaformat, the identifying bytes at the beginning must NOT be the same as for PNG.
PNG has been expressly designed not to be completely dependent on a single compression technique. Although inflate/deflate compression is mentioned in this document, PNG would still exist without it.
PNG supports an alpha channel as well as the transparency-index approach used in GIF. An alpha channel is much more flexible than a transparency index, whereas a transparency index compresses more efficiently.
All integers wider than one byte are stored in network byte order: the most significant byte comes first, followed by the less significant bytes in descending order of significance (MSB LSB for 2-byte integers, B3 B2 B1 B0 for 4-byte integers). References to bit 7 mean the highest bit (128) of a byte; references to bit 0 mean the lowest bit (1) of a byte.
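To illustrate the byte order rule, here is a minimal C sketch that assembles integers with shifts, so it behaves the same on any host byte order. The function names are hypothetical helpers, not part of any existing library. The htonl/ntohl routines mentioned earlier would serve equally well for 4-byte values.

```c
#include <stdint.h>

/* Read a 2-byte network-order (big-endian) unsigned integer: MSB first. */
static uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Read a 4-byte network-order unsigned integer stored as B3 B2 B1 B0. */
static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 4-byte unsigned integer in network order (most significant first). */
static void write_u32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```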
All color values range from zero (black) to most intense at the maximum value. The AGMA chunk specifies the gamma response of the source device, and viewers are strongly encouraged to properly compensate.
Non-square pixels can be represented, but viewers are not required to account for them; see the APHY chunk.
The first six bytes always contain the following values (decimal):
137 8 80 78 71 26
The first two bytes distinguish the file on systems that use the first two bytes of a file to identify its type. The second byte is also a backspace (8), which erases the nonprinting first character when the file is displayed as text, making the following text visible. The next three bytes are the ASCII values of the letters "P", "N", and "G". The last byte is a control-Z character (26), permitting display to stop elegantly on DOS systems if the TYPE command is used to display the file.
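A reader can recognize a file from this draft by comparing its first six bytes. The sketch below assumes the header is already in memory; `has_png_draft_signature` is an illustrative name only.

```c
#include <stddef.h>
#include <string.h>

/* The six identifying bytes from this draft, in decimal:
   137, backspace (8), 'P', 'N', 'G', control-Z (26). */
static const unsigned char png_draft_signature[6] = { 137, 8, 'P', 'N', 'G', 26 };

/* Return 1 if the buffer starts with the draft signature, else 0. */
static int has_png_draft_signature(const unsigned char *buf, size_t len)
{
    return len >= sizeof png_draft_signature &&
           memcmp(buf, png_draft_signature, sizeof png_draft_signature) == 0;
}
```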
The remainder of the file consists of a series of chunks. Each chunk consists of a 4-byte chunk type, a 4-byte UNSIGNED length (which counts neither itself nor the chunk type), and the data bytes appropriate to that chunk, if any. Note that this layout allows an implementation to skip a chunk even if it does not recognize that particular chunk type. The last chunk must always be an EOF chunk.
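Because every chunk carries its own length, a reader can walk the file without understanding each type. The following sketch parses one chunk at a time from an in-memory buffer, using this draft's order (type first, then length); there is no per-chunk checksum in this layout. It also rejects lengths above (2^31)-1, following the size limit this specification recommends. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One chunk as laid out in this draft: 4-byte type, 4-byte unsigned
   big-endian length, then `length` data bytes. */
struct draft_chunk {
    char type[5];                 /* NUL-terminated copy of the type */
    uint32_t length;
    const unsigned char *data;    /* points into the caller's buffer */
};

/* Parse the chunk starting at *pos. On success, fill *out, advance *pos
   past the chunk and return 1. Return 0 if the buffer is truncated or the
   length exceeds (2^31)-1. */
static int next_chunk(const unsigned char *buf, size_t len, size_t *pos,
                      struct draft_chunk *out)
{
    size_t p = *pos;
    if (len < 8 || p > len - 8)
        return 0;                                  /* no room for type + length */
    memcpy(out->type, buf + p, 4);
    out->type[4] = '\0';
    out->length = ((uint32_t)buf[p + 4] << 24) | ((uint32_t)buf[p + 5] << 16) |
                  ((uint32_t)buf[p + 6] << 8)  |  (uint32_t)buf[p + 7];
    if (out->length > 0x7FFFFFFFu || out->length > len - (p + 8))
        return 0;                                  /* too large or truncated */
    out->data = buf + p + 8;
    *pos = p + 8 + out->length;                    /* skip to the next chunk */
    return 1;
}
```

A caller would start at offset 6, just past the identifying bytes, and call `next_chunk` repeatedly until it reaches the EOF chunk.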
The four-byte chunk type should consist entirely of uppercase ASCII letters, with the following exceptions:
Spaces (ascii 32) are permitted at the end in order to pad out to four bytes.
Lowercase letters are permitted if the chunk is proprietary (see below).
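These naming rules can be checked mechanically. This sketch assumes that a type must start with at least one letter and that spaces may appear only as trailing padding. The function name is illustrative.

```c
#include <stdbool.h>

/* Check a 4-byte chunk type against this draft's naming rules: ASCII
   letters (lowercase only in proprietary types), optionally padded at the
   end with spaces (ASCII 32). */
static bool chunk_type_is_valid(const unsigned char t[4])
{
    int i = 0;
    /* Leading part: letters only; at least one is assumed to be required. */
    while (i < 4 && ((t[i] >= 'A' && t[i] <= 'Z') || (t[i] >= 'a' && t[i] <= 'z')))
        i++;
    if (i == 0)
        return false;
    /* Remainder: padding spaces only. */
    for (; i < 4; i++)
        if (t[i] != ' ')
            return false;
    return true;
}
```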
IMPORTANT: Even though chunk lengths are unsigned, chunks should not exceed (2^31)-1 bytes in size, in order to accommodate languages that handle 4-byte unsigned integers poorly. (1- and 2-byte unsigned integers can be handled in such languages by using the next larger size of integer.)
Note also that the same chunk type can appear more than once if necessary, but only if so specified in the description of the chunk. This is sometimes necessary in order to implement streaming encoders.
The chunk-ordering mechanism present in the first two drafts has been dropped. Instead, rules regarding chunk order are stated in the description of each chunk.
Chunks which are not strictly necessary in order to meaningfully display the contents of the file are known as "ancillary" chunks, and their names must begin with a capital "A" character.
Chunks which are critical to the successful display of the file's contents begin with any other letter.
Critical chunks are necessary in order to properly display the contents of the file. If an implementation encounters a critical chunk type it does not know how to handle, it must indicate this to the user and not display the contents of the file. The image header chunk (HEAD) is an example of a critical chunk.
A hypothetical vector-graphics chunk would also be a critical chunk, since without rendering it the image would appear blank, or would show only a background bitmap with no other information.
Ancillary chunks carry information that enhances the image in some fashion, but without which the image can still be displayed successfully. Examples are the comment (ACMT) and copyright (ACPY) chunks.
If you want others outside your organization to understand a chunk type that you invent, CONTACT THE AUTHOR OF THE PNG SPECIFICATION (boutell@netcom.com) and specify the format of the chunk's data and your preferred chunk type. The author will assign a permanent, unique chunk type. The chunk type will be publicly listed in an appendix of extended chunk types which can be optionally implemented. In the event that Mr. Boutell is unable to maintain the specification, the task will be passed on to a qualified volunteer.
If you do not require or desire that others outside your organization understand the chunk type, you may use a chunk name containing at least one lowercase character. For ancillary chunk types, begin the chunk name with a capital 'A' character. Chunk types containing lowercase letters will never be assigned in the public specification. Please note that if you want to use these chunks for information that is not essential to view the image, and have any desire whatsoever that others not using your internal viewer software be able to view the image, you should use an ancillary chunk type rather than a critical chunk type (that is, the chunk type should begin with 'A'). Also note that others may use the same proprietary prefixes, so it would be advantageous to keep additional identifying information at the beginning of the chunk.
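The first letter and the case of the letters carry a chunk's status, so a decoder can decide how to treat an unfamiliar chunk without a lookup table. Below is a sketch, with hypothetical names, of the policy described above: unknown ancillary chunks are skipped, while an unknown critical chunk stops display of the image.

```c
#include <stdbool.h>

/* Ancillary chunk types begin with a capital 'A'; all others are critical. */
static bool chunk_is_ancillary(const unsigned char t[4])
{
    return t[0] == 'A';
}

/* Any lowercase letter marks a proprietary type; such types are never
   assigned in the public specification. */
static bool chunk_is_proprietary(const unsigned char t[4])
{
    for (int i = 0; i < 4; i++)
        if (t[i] >= 'a' && t[i] <= 'z')
            return true;
    return false;
}

enum chunk_action { CHUNK_PROCESS, CHUNK_SKIP, CHUNK_REFUSE_IMAGE };

/* Unknown ancillary chunks are skipped using their length; an unknown
   critical chunk means the image must not be displayed (tell the user). */
static enum chunk_action classify_chunk(const unsigned char t[4], bool known)
{
    if (known)
        return CHUNK_PROCESS;
    return chunk_is_ancillary(t) ? CHUNK_SKIP : CHUNK_REFUSE_IMAGE;
}
```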
All PNG implementations must accept the following chunk types in order to be considered PNG-compliant. All implementations must understand and successfully render the critical chunks below. Standalone image viewers should also be capable of displaying the ancillary chunks below, such as the copyright notice, but this is not necessary for applications in which many images may be displayed at once (e.g., WWW browsers).
HEAD  Bitmapped image header

This chunk must appear FIRST if the file contains a bitmapped image. It contains the following fields:

   Width:              4 bytes
   Height:             4 bytes
   Bit depth:          1 byte
   Color type:         1 byte
   Compression type:   1 byte
   Interlace type:     1 byte

Width and height are 4-byte integers. Zero is an invalid value. The maximum for both is (2^31)-1, in order to accommodate languages which have difficulty with unsigned 4-byte values.

Bit depth is a single-byte integer. Valid values, which software must support, are 1, 2, 4, 8, and 16. (Note that bit depths of 16 are easily supported on 8-bit display hardware by dropping the least significant byte.)

Color type is a single-byte integer. Valid values are 1, 2, 3, and 4. Color type determines the interpretation of the image data:

   Color type   Valid bit depths   Interpretation
   1            1, 2, 4, 8         Each pixel value is a palette index; a PLTE chunk must appear.
   2            1, 2, 4, 8, 16     Each pixel value is a grayscale level, where the largest value is white and zero is black.
   3            8, 16              Each pixel value is a three-value series: red (0 = black, max = red), green (0 = black, max = green), blue (0 = black, max = blue).
   4            8, 16              Each pixel value is a four-value series: red (0 = black, max = red), green (0 = black, max = green), blue (0 = black, max = blue), alpha (0 = transparent, max = opaque).

Compression type indicates the compression scheme used to compress the image data. This draft proposes the inflate/deflate compression scheme, an LZ77 derivative used in zip, gzip, pkzip and related programs, because extensive research has been done supporting its legality. Inflate and deflate code is available in the zip/unzip packages under a very permissive license (yes, permissive enough for commercial purposes; see those packages for details). At present, only compression type 0 (inflate/deflate compression with a 32K sliding window) is defined, and all standard PNG images will be compressed using this scheme.
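A decoder might read and validate the HEAD fields as sketched below. The offsets follow the field list above (12 bytes in total). Treating any other chunk length as an error, and the names used, are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

struct head_chunk {
    uint32_t width, height;
    unsigned bit_depth, color_type, compression_type, interlace_type;
};

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Bit depths this draft allows for each color type. */
static bool depth_allowed(unsigned color_type, unsigned depth)
{
    switch (color_type) {
    case 1: return depth == 1 || depth == 2 || depth == 4 || depth == 8;
    case 2: return depth == 1 || depth == 2 || depth == 4 || depth == 8 || depth == 16;
    case 3:
    case 4: return depth == 8 || depth == 16;
    default: return false;
    }
}

/* Parse the HEAD payload; return false on any value this draft does not define. */
static bool parse_head(const unsigned char *d, uint32_t len, struct head_chunk *h)
{
    if (len != 12)                      /* 4 + 4 + 1 + 1 + 1 + 1 bytes */
        return false;
    h->width = be32(d);
    h->height = be32(d + 4);
    h->bit_depth = d[8];
    h->color_type = d[9];
    h->compression_type = d[10];
    h->interlace_type = d[11];
    return h->width != 0 && h->width <= 0x7FFFFFFFu &&
           h->height != 0 && h->height <= 0x7FFFFFFFu &&
           depth_allowed(h->color_type, h->bit_depth) &&
           h->compression_type == 0 &&  /* only inflate/deflate is defined */
           (h->interlace_type == 0 || h->interlace_type == 1);
}
```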
Interlace type: At present, there are two legal values for interlace type: 0 (no interlace) and 1 (line-wise interlace). With interlace type 0, rows are laid out continuously from top to bottom. With interlace type 1, rows are stored in the following order:

   1. Every eighth row, starting from row 0
   2. Every eighth row, starting from row 4
   3. Every fourth row, starting from row 2
   4. Every second row, starting from row 1

The purpose of this feature is to allow images to "fade in" in a simple fashion that does minimal damage to compression efficiency, although the file size is slightly expanded on average. Other interlace types have been proposed, and will replace this scheme in the final proposal if the gain in visual quality is sufficient to outweigh any compression penalties.

AGMA  Gamma correction

   Gamma correction factor:   2 bytes

The gamma correction chunk specifies the gamma of the device which created the image, and for which the color values are intended. If the encoder does not know the gamma value, it should not write a gamma chunk; the absence of a gamma chunk indicates that the gamma is unknown. If the gamma chunk does appear, it must precede the PLTE chunk. If it is possible for the encoder to determine the gamma, or to make a strong guess based on the hardware on which it runs, then the encoder is strongly encouraged to output the AGMA chunk.

The gamma function determines the true response of the video display to a given level, assuming that input levels have been normalized to a range between 0.0 and 1.0:

   brightness = inputLevel ^ gamma

A value of 1000 is equivalent to a gamma of 1.0, a value of 2000 to a gamma of 2.0, and so on (divide by 1000.0).
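For interlace type 1, the stored row order can be generated from the four passes listed in the interlace description above. This is a sketch with a hypothetical function name; it assumes the caller supplies an array of at least `height` entries.

```c
#include <stdint.h>

/* Write the row numbers of a `height`-row image into `order`, in the order
   interlace type 1 stores them. Returns the number of rows written. */
static uint32_t interlace_row_order(uint32_t height, uint32_t *order)
{
    static const uint32_t start[4] = {0, 4, 2, 1};   /* first row of each pass */
    static const uint32_t step[4]  = {8, 8, 4, 2};   /* row spacing in each pass */
    uint32_t n = 0;
    for (int pass = 0; pass < 4; pass++)
        for (uint32_t y = start[pass]; y < height; y += step[pass])
            order[n++] = y;
    return n;
}
```

For a 10-row image the order is 0, 8, 4, 2, 6, 1, 3, 5, 7, 9; every row appears exactly once.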
Thus, when writing an image display program, if the display hardware has a gamma value of 2.0 (2000) and the gamma specified in the gamma correction chunk for a particular image is 3.0 (3000), then color and grayscale levels should ideally be normalized to a range between 0.0 and 1.0 and then converted according to the following function:

   nativeLevel = inputLevel ^ (inputGamma / nativeGamma)

where inputLevel is the level specified for that pixel in the PNG file, inputGamma is the gamma specified in the PNG file, and nativeGamma is the gamma of the actual display to be used. In this example, each normalized level would be raised to the power 3.0 / 2.0 = 1.5.

In practice, it is often difficult to determine the gamma of the actual display. It is common to assume a gamma of 2.2 (or 1.0, on hardware for which this value is common) and allow the user to modify this value at their option. Also note that it is not difficult to calculate a gamma conversion table; it is *not* necessary to perform transcendental math for every pixel! Although viewers are strongly encouraged to implement gamma correction, in some cases speed may be a concern. In these cases, viewers are encouraged to provide gamma correction tables for gamma values of 1.0 and 2.2, and to use the table closest to the gamma indicated in the file.

PLTE  Palette

This chunk must appear for color type 1, and may appear for color types 3 and 4. If this chunk does appear, it must precede the first IDAT chunk. In the case of color types 3 and 4, the palette chunk is optional, and provides a recommended set of from 1 to 256 colors to which the truecolor image should be quantized if the display hardware cannot display truecolor directly. If it is not present, the viewer must select colors on its own, but it is most efficient for this to be done once by the encoder.

The number of palette entries varies from 1 to 256. For color type 1, the number of entries should not exceed the range that can be represented by the bit depth (for example, 2^4 = 16 for a bit depth of 4).
Note that this does NOT mean that there have to be a full 16 entries. The length of the chunk is used to determine the number of entries (the chunk length divided by 3).

For color type 1, each palette entry consists of a three-byte series:

   red (0 = black, 255 = red), green (0 = black, 255 = green), blue (0 = black, 255 = blue)

Image creation programs are strongly encouraged to place colors which the artist or algorithm regards as important first in the palette, when such information is available, in order to allow display hardware with a limited supply of colors to make intelligent compromises.

For color types 3 and 4, in which the palette is optional and only a suggested quantization, exactly the same format is used, again with 3 bytes per palette entry:

   red (0 = black, 255 = red), green (0 = black, 255 = green), blue (0 = black, 255 = blue)

Note that the palette uses 8 bits (1 byte) per value regardless of the image bit depth specification. In particular, the palette is 8 bits deep even when it is a suggested quantization of a 16-bit truecolor image.

ATNS  Transparency

Transparency is a simple alternative to the full truecolor alpha channel which does not compromise compression.

   For color type 1: Transparent index into palette (1 byte, range: 0 to (size of palette - 1))
   For color type 2: Transparent gray level (2 bytes, range: 0 to (2^bitdepth - 1))
   For color type 3: Transparent RGB color (6 bytes, 2 bytes each for the red, green and blue components, range for each: 0 to (2^bitdepth - 1))

For color type 1, any value outside the size of the palette is an error. Note that the size of the palette is determined by the size of the palette chunk (and thus the number of three-byte entries in it), and not by the bit depth.

The transparency chunk, when present, specifies a single palette entry, grayscale level or RGB color which should be regarded as transparent. Although transparency is not as elegant as the full alpha channel of color type 4, transparency does not adversely affect the compression of the image.
When present, the ATNS chunk must precede the first IDAT chunk, and follow the PLTE chunk, if any.

ABGD  Background color

When displaying the image in a stand-alone viewer, it is useful to specify the background color against which the image is intended to appear.

   For color type 1: Background index into palette (1 byte, range: 0 to (size of palette - 1))
   For color type 2: Background gray level (2 bytes, range: 0 to (2^bitdepth - 1))
   For color type 3: Background RGB color (6 bytes, 2 bytes each for the red, green and blue components, range for each: 0 to (2^bitdepth - 1))

When present, the ABGD chunk must precede the first IDAT chunk, and follow the PLTE chunk, if any.

ACPY  Copyright notice

The notice consists of ISO 8859-1 (LATIN-1) text and is not null-terminated. New lines should be denoted by a single line feed (10 decimal). If this chunk appears, it must appear before the first IDAT chunk.

ACMT  Comment

The comment consists of ISO 8859-1 (LATIN-1) text and is not null-terminated. New lines should be denoted by a single line feed (10 decimal). If this chunk appears, it must appear before the first IDAT chunk. Several ACMT chunks may appear; they are distinct comments, not one continuous text.

APHY  Physical pixel dimensions

   4 bytes: pixels per unit, X axis (unsigned integer)
   4 bytes: pixels per unit, Y axis (unsigned integer)
   1 byte:  unit specifier

The following values are legal for the unit specifier:

   0: units unknown (aspect ratio only)
   1: unit is the decimeter (10 centimeters)
   2: unit is the foot (12 inches)

Large units are employed to ensure sufficient resolution. If this ancillary chunk is not present, pixels are assumed to be square, and the physical size of each pixel is unknown. (Conversion note: one inch is equal to 2.54 centimeters.)

APRI  Physical image location for printing purposes

   4 bytes: image position in microns (X axis)
   4 bytes: image position in microns (Y axis)

These fields give the position on a printed page at which the image should be output when printed alone.
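ATNS and ABGD share the same color-type-dependent layout, so one parser can serve both. The sketch below also converts APHY values to dots per inch using the conversion note above (one inch = 2.54 cm = 0.254 decimeter). Treating color type 4 as having no defined layout, and all names, are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A key color as carried by ATNS (transparency) or ABGD (background). */
struct key_color {
    unsigned index;          /* color type 1 */
    uint16_t gray;           /* color type 2 */
    uint16_t r, g, b;        /* color type 3 */
};

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* palette_entries is the PLTE chunk length divided by 3. Returns false if
   the payload size or a value does not fit the color type and bit depth. */
static bool parse_key_color(const unsigned char *d, size_t len,
                            unsigned color_type, unsigned bit_depth,
                            size_t palette_entries, struct key_color *k)
{
    uint32_t max = (1u << bit_depth) - 1;          /* 2^bitdepth - 1 */
    switch (color_type) {
    case 1:
        if (len != 1 || d[0] >= palette_entries)
            return false;
        k->index = d[0];
        return true;
    case 2:
        if (len != 2 || be16(d) > max)
            return false;
        k->gray = be16(d);
        return true;
    case 3:
        if (len != 6 || be16(d) > max || be16(d + 2) > max || be16(d + 4) > max)
            return false;
        k->r = be16(d);
        k->g = be16(d + 2);
        k->b = be16(d + 4);
        return true;
    default:
        return false;        /* no layout assumed for color type 4 */
    }
}

/* Convert APHY pixels-per-unit to dots per inch; 0.0 when the unit is
   unknown (aspect ratio only). */
static double aphy_to_dpi(uint32_t pixels_per_unit, unsigned unit)
{
    if (unit == 1) return pixels_per_unit * 0.254;   /* per decimeter */
    if (unit == 2) return pixels_per_unit / 12.0;    /* per foot */
    return 0.0;
}
```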
ATME  Time of image creation

   4 bytes: time in seconds since the beginning of January 1st, 1970, Greenwich Mean Time

ATMB  Thumbnail image

This chunk contains an additional, complete PNG data stream, from the six-byte header to the EOF chunk, with the constraint that the enclosed stream should not include another ATMB chunk. The PNG stream should describe a much smaller version of the same image, suitable for icon or catalog use. If the ATMB chunk appears, it should appear before the first IDAT chunk. Since the ATMB chunk is a complete PNG stream in its own right, it can easily be extracted and transmitted independently by packages such as web servers, and it can be palette-based even if the complete image is a truecolor image. Note that the entire thumbnail must fit in a single ATMB chunk; this is intentional, as thumbnails are intended to be much smaller than the full image.

IDAT  Image data

The image data is compressed using the compression scheme indicated by the compression type field of the HEAD chunk. IMPORTANT: the compressed image data is the concatenation of the contents of ALL the IDAT chunks. (If there are multiple IDAT chunks, they always appear consecutively.) Viewers must be able to interpret such chunks. (Simply put, the viewer knows it is not finished until it has read as many pixels as are indicated by the image dimensions in the HEAD chunk.) This rule exists to permit encoders to work in a fixed amount of memory by outputting multiple chunks.

The following text describes the uncompressed data stream which is fed to the compressor or received from the decompressor. Pixels are always laid out left to right in each row, and rows are arranged from top to bottom, except as modified by the interlace type field of the HEAD chunk.

Color types 1 and 2

For color type 1, each pixel value is an index into the palette indicating which color in the palette should be displayed at that location.
For color type 2 (grayscale), each pixel value is a grayscale level, where the maximum value representable by the bit depth is white.

For 1-bit images, each horizontal line of pixels is represented by a stream of bits, in which bit 7 (128) is the leftmost pixel in the byte and bit 0 (1) is the rightmost. Consecutive lines may share bytes if the pixels in a line do not fit evenly into bytes. That is, if the last pixel of a line falls in bit 4 of a byte, the first pixel of the next line is stored in bit 3 of the same byte.

For 2-bit images, the same scheme is followed, except that each pixel is represented by a 2-bit portion of a byte, with the leftmost bit being most significant. For instance, the first pixel of the line is represented by bits 7 (128) and 6 (64) of the byte. Consecutive lines may share bytes.

For 4-bit images, the same scheme is followed, except that each pixel is represented by a 4-bit portion of a byte, with the leftmost bit being most significant. For instance, the first pixel of the line is represented by bits 7 (128), 6 (64), 5 (32) and 4 (16) of the byte. Consecutive lines may share bytes.

For 8-bit images, each pixel is represented by a single byte. For 16-bit grayscale images (color type 2), each pixel is represented by a two-byte unsigned integer.

IMPORTANT: For 8- and 16-bit grayscale images (color type 2, bit depth of 8 or 16), the values are next input to the CROSS filter (for non-interlaced images; see below) or to the SUB filter (for interlaced images; see below) in order to improve compression before being input to the compressor itself. This step is NOT employed for palette color images (color type 1).

Color types 3 and 4

For color type 3, each pixel is represented by a red value, a green value, and a blue value, each 8 or 16 bits depending on the bit depth (8 or 16). For color type 4, an additional alpha (opacity) value of the same depth is added for each pixel.
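Because consecutive lines may share bytes in the 1-, 2- and 4-bit layouts described above, a pixel's bit position depends on its position in the whole image, not just within its row. Here is a sketch for 1-, 2-, 4- and 8-bit data, assuming the data has already been decompressed and is not interlaced. `get_packed_pixel` is a hypothetical name, and 16-bit samples are not handled.

```c
#include <stdint.h>

/* Fetch pixel (x, y) from decompressed 1-, 2-, 4- or 8-bit data. Rows are
   not padded to byte boundaries: the image is one continuous bit stream,
   with the leftmost pixel in the most significant bits of each byte. */
static unsigned get_packed_pixel(const unsigned char *data, uint32_t width,
                                 unsigned depth, uint32_t x, uint32_t y)
{
    uint64_t bit = ((uint64_t)y * width + x) * depth;    /* offset from start */
    unsigned char byte = data[bit / 8];
    unsigned shift = 8 - depth - (unsigned)(bit % 8);    /* bits to its right */
    return (byte >> shift) & ((1u << depth) - 1);
}
```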
IMPORTANT: For color types 3 and 4, the values are next input to the CROSS filter (for non-interlaced images; see below) or to the SUB filter (for interlaced images; see below) in order to improve compression before being input to the compressor itself.

EOF  End of file

   CRC: 4 bytes

The EOF chunk must appear last in the PNG file. Note that the letters EOF are followed by a space (decimal 32) to make up the four-byte chunk type. The EOF chunk contains a 4-byte CRC (Cyclic Redundancy Check) of all preceding bytes in the file, including the identifying header, all preceding chunks, and the EOF chunk's type and length fields. The CRC is NOT optional. If the CRC does not match that calculated by the viewer, the viewer may elect to attempt to display the contents of the file, but must warn the user that the checksum is incorrect. This mechanism helps to detect images that have been improperly transmitted.
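This draft does not name its CRC polynomial. Assuming the CRC-32 used by zip and gzip (reflected polynomial 0xEDB88320), and assuming the stored CRC follows the file-wide network byte order rule, a viewer could verify the EOF chunk as sketched below. The names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, initial value and final
   XOR 0xFFFFFFFF. ASSUMPTION: the draft itself does not specify this. */
static uint32_t crc32_bytes(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)          /* one bit at a time, no table */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* `file` holds the whole file; its last 4 bytes are the EOF chunk's CRC,
   which covers every byte before it. Returns 1 on match, 0 otherwise. */
static int eof_crc_matches(const unsigned char *file, size_t len)
{
    if (len < 4)
        return 0;
    uint32_t stored = ((uint32_t)file[len - 4] << 24) | ((uint32_t)file[len - 3] << 16) |
                      ((uint32_t)file[len - 2] << 8)  |  (uint32_t)file[len - 1];
    return crc32_bytes(file, len - 4) == stored;
}
```

On a mismatch, a viewer following this specification may still try to display the image, but must warn the user.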
For inflate/deflate source code, see the zip/unzip package, which includes code for both decompression and compression in the files inflate.c and deflate.c, under a very permissive license. Documentation of the compression scheme is also available; see the zip/unzip package for references. (zip/unzip and pkzip are compatible but not identical. pkzip is commercial software.)
A formal, detailed specification of inflate and deflate will be included in the final standard, and is being written at this time. The formal specification will be compatible with the format defined by the inflate.c/deflate.c code.
The sub filter is used to improve compression on interlaced truecolor images (color types 3 and 4) and 8- and 16-bit grayscale images (color type 2).
For each pixel, output the difference between that pixel and the previous pixel, modulo 2^bitdepth (256 for a bit depth of 8, 65536 for a bit depth of 16). For instance, for a bit depth of 8, if the previous pixel was 16 and the current pixel is 64, store 48. If the previous pixel was 255 and the current pixel is 20, store 21 (20 - 255 = -235, and -235 + 256 = 21). Note that unsigned modulo arithmetic is used, so the decoder reverses the filter with unsigned addition. IMPORTANT: At the start of each line, consider the previous pixel value to be zero.
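Here is a sketch of the sub filter and its inverse for one row. Samples are held in `uint16_t` so the same code covers 8- and 16-bit depths through the mask (0xFF or 0xFFFF). This description does not say how multi-channel pixels are handled. This sketch assumes each channel is differenced against the same channel of the previous pixel, as the cross filter is specified to do per channel. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sub filter, in place, over `pixels` pixels of `channels` samples each.
   The first pixel is stored unchanged (its previous pixel counts as zero). */
static void sub_filter_row(uint16_t *row, size_t pixels, unsigned channels,
                           uint16_t mask)
{
    /* Right to left, so each difference uses the original neighbor. */
    for (size_t i = pixels * channels; i-- > channels; )
        row[i] = (uint16_t)(((unsigned)row[i] - row[i - channels]) & mask);
}

/* Inverse: left to right, adding the already reconstructed neighbor. */
static void unsub_filter_row(uint16_t *row, size_t pixels, unsigned channels,
                             uint16_t mask)
{
    for (size_t i = channels; i < pixels * channels; i++)
        row[i] = (uint16_t)(((unsigned)row[i] + row[i - channels]) & mask);
}
```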
The cross filter is used to improve compression on non-interlaced truecolor images (color types 3 and 4) and 8- and 16-bit grayscale images (color type 2). Cross is similar to sub, but takes the previous line into account (highly effective as long as the image is not interlaced).
Output the following value, using unsigned modulo arithmetic and integers of the size appropriate to the bit depth (8 or 16):
Pixel[x][y] - Pixel[x-1][y] - Pixel[x][y-1] + Pixel[x-1][y-1]
for each channel (red, green, blue, and sometimes alpha) of each pixel.
On the first row of the image, the previous row is considered to have contained only zeroes. On the first pixel of each row, the previous pixel is considered to have contained only zeroes.
To reverse the effect of the cross filter after decompression, output the following value:
CrossedValue + Pixel[x-1][y] + Pixel[x][y-1] - Pixel[x-1][y-1]
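using the same unsigned modulo arithmetic, and storing the result as Pixel[x][y]. The reconstructed value then serves as the previous pixel when uncrossing the next pixel in the row, and as part of the previous row when uncrossing the row below.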
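Here is a sketch of the cross filter and its inverse for a whole non-interlaced image, held as one row-major array with `ch` samples per pixel and applied per channel. The forward pass runs from the bottom-right corner so that every neighbor it reads is still an original value. The inverse runs from the top-left so that every neighbor it reads has already been reconstructed. The names and the in-place layout are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Channel c of pixel (x, y); neighbors outside the image read as zero. */
static unsigned px(const uint16_t *img, size_t w, unsigned ch,
                   long x, long y, unsigned c)
{
    if (x < 0 || y < 0)
        return 0;
    return img[((size_t)y * w + (size_t)x) * ch + c];
}

/* Pixel[x][y] - Pixel[x-1][y] - Pixel[x][y-1] + Pixel[x-1][y-1], mod mask+1. */
static void cross_filter(uint16_t *img, size_t w, size_t h, unsigned ch,
                         uint16_t mask)
{
    for (long y = (long)h - 1; y >= 0; y--)
        for (long x = (long)w - 1; x >= 0; x--)
            for (unsigned c = 0; c < ch; c++) {
                unsigned v = px(img, w, ch, x, y, c) - px(img, w, ch, x - 1, y, c)
                           - px(img, w, ch, x, y - 1, c) + px(img, w, ch, x - 1, y - 1, c);
                img[((size_t)y * w + (size_t)x) * ch + c] = (uint16_t)(v & mask);
            }
}

/* CrossedValue + Pixel[x-1][y] + Pixel[x][y-1] - Pixel[x-1][y-1], mod mask+1. */
static void uncross_filter(uint16_t *img, size_t w, size_t h, unsigned ch,
                           uint16_t mask)
{
    for (long y = 0; y < (long)h; y++)
        for (long x = 0; x < (long)w; x++)
            for (unsigned c = 0; c < ch; c++) {
                unsigned v = px(img, w, ch, x, y, c) + px(img, w, ch, x - 1, y, c)
                           + px(img, w, ch, x, y - 1, c) - px(img, w, ch, x - 1, y - 1, c);
                img[((size_t)y * w + (size_t)x) * ch + c] = (uint16_t)(v & mask);
            }
}
```

For a 2x2 grayscale image with rows (10, 20) and (30, 45), the crossed values are (10, 10) and (20, 5).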
Standalone image viewers can ignore the alpha channel, provided that they properly skip over it in order to be in the right position to read the next pixel. However, if the background color has been set with the ABGD chunk, the alpha channel can be meaningfully interpreted with respect to it even in a standalone image viewer.
World Wide Web browsers and the like should regard any pixel with an alpha channel value of zero as transparent (the pixel should be given the background color of the browser), and any pixel with the maximum alpha channel value for that bit depth as opaque (not blending with the background at all).
Viewers which are not in a position to smoothly combine foreground and background colors should regard any nonzero alpha channel value as fully opaque (fully foreground color).
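The specification leaves the blending formula to the viewer. A common choice, assumed here, is linear interpolation between foreground and background for each channel. The sketch below shows this for 8-bit samples, together with the fallback described above for viewers that cannot blend, where any nonzero alpha counts as opaque. It ignores gamma, which a careful viewer might take into account. The function names are hypothetical.

```c
#include <stdint.h>

/* Blend one 8-bit channel: alpha 0 = background only, 255 = foreground
   only, rounding to the nearest integer. */
static uint8_t blend8(uint8_t fg, uint8_t bg, uint8_t alpha)
{
    unsigned v = (unsigned)fg * alpha + (unsigned)bg * (255u - alpha);
    return (uint8_t)((v + 127u) / 255u);
}

/* Fallback for viewers that cannot blend: nonzero alpha is fully opaque. */
static uint8_t threshold8(uint8_t fg, uint8_t bg, uint8_t alpha)
{
    return alpha != 0 ? fg : bg;
}
```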
For applications that do not require a full alpha channel, or cannot afford the price in compression efficiency, the ATNS transparency chunk is also available.
End of PNG Specification